Abstract

Real-time bidding has become one of the largest online advertising markets in the world. Today the bid price per ad impression is typically decided by the expected value of how it can lead to a desired action event for the advertiser. However, this industry-standard approach to deciding the bid price does not consider the actual effect of the ad shown to the user, which should be measured based on the performance lift among users who have been or have not been exposed to a certain treatment of ads. In this paper, we propose a new bidding strategy and prove that if the bid price is decided based on the performance lift rather than the absolute performance value, advertisers can actually gain more action events. We describe the modeling methodology to predict the performance lift and demonstrate the actual performance gain through a blind A/B test with real ad campaigns. We also show that to move the demand-side platforms to bid based on performance lift, they should be rewarded based on the relative performance lift they contribute.


Introduction 


Online advertising is one of the fastest growing industries, with $58 billion total spend projected in 2015 in the US alone. One of the most significant trends in online advertising in recent years is real-time bidding (RTB), sometimes more broadly referred to as programmatic buying. In RTB, advertisers have the ability to decide whether and how much to bid for every impression that would lead to the best expected outcome. It is analogous to stock exchanges in that data-driven algorithms are used to automatically buy and sell ads in real time. The bidding algorithm can use the contextual and user behavioral data to select the best ads in order to optimize the effectiveness of online advertising.

Demand-Side Platforms (DSPs) are thus created to help advertisers manage their campaigns and optimize their real-time bidding activities. DSPs offer different pricing models, such as cost per impression (CPM)¹, cost per click (CPC), and cost per action (CPA). This paper focuses on the CPA pricing model as it is the most challenging problem. State-of-the-art DSPs that support such a CPA pricing model typically convert an advertiser’s CPA bid to an expected cost per impression (eCPM) bid in order to participate in the RTB auctions, where the winning ad is chosen based on the highest bid (McAfee 2011). In pure second-price auctions, theoretically the optimal bidding strategy is truth-telling. Therefore, the prevalent practice to derive such an eCPM bid is estimating the Action Rate (AR), which is the probability that the impression will lead to a desired action, and multiplying it by the CPA bid (i.e., eCPM = AR × CPA).

However, such a bidding strategy neglects the probability that a user will take the desired action even if the impression is not shown. For example, a loyal Pampers customer will make further purchases even if not exposed to any Pampers ad. An analogy can be found in political marketing. As early as 2012, Obama’s presidential campaign already focused on the swing voters by quantifying how easily they could be shifted to vote for the Democrat (Rutenberg 2013). It is surprising that the online advertising market has lagged behind at this point. We argue that the prevalent bidding strategy is categorically suboptimal by design.

1 Sometimes refers to cost per thousand impressions to make the numbers easier to manage.


Motivating Examples 

The above bidding strategy calculates the bid price based 
on the AR (proxy for value to the advertiser) of a user. The 
problem is that campaign budget would be spent on users 
who already have high ARs instead of those who could have 
been greatly influenced by the ads (i.e., those with high AR 
lift because of the ads). The difference is best illustrated by 
the examples below. 

Example 1 (Value-based Bidding). Suppose a DSP is bidding on behalf of an advertiser to acquire impressions and the advertiser’s CPA = $100. Suppose there are two ad requests from user a and b respectively. The AR of user a is 0.04 if she is shown the advertiser’s ad, otherwise the AR is 0.03. User b has an AR of 0.02 if she is shown the ad, otherwise the AR is 0.001.

Based on the common practice in the industry, the bid prices should be the absolute ARs (assuming ads are shown) times CPA. So the bid prices for a and b are 0.04 × $100 = $4 and 0.02 × $100 = $2 respectively. Suppose the highest bid prices from other competitors are $3.5 for both a and b, which means the advertiser will win the auction for a and lose the auction for b. In this case, the expected total number of actions the advertiser can have is 0.04 + 0.001 = 0.041, and the inventory cost of the DSP is $3.5. Since the advertiser only pays for impressions that lead to actions, the expected revenue of the DSP is 0.04 × $100 = $4. ■

Example 2 (Lift-based Bidding). Let us continue with Example 1. It is not difficult to see that user b should be preferable to the advertiser because the advertiser could expect a more significant AR lift from her (the AR lift of a is 0.04 - 0.03 = 0.01 while that of b is 0.02 - 0.001 = 0.019).

If the bid prices the DSP places on a and b are proportional to the AR lifts, for instance $2 and $3.8, then the advertiser will win the auction for b instead of a. In this case, the expected total number of actions the advertiser can have is 0.03 + 0.02 = 0.05, better than that in Example 1. The inventory cost of the DSP is also $3.5, but the expected revenue of the DSP becomes 0.02 × $100 = $2 because the advertiser only pays for impressions that lead to actions. It results in a negative profit for the DSP. ■

The above toy examples show that the advertiser eventually gets more actions when the DSP bids based on AR lift than when it bids based on absolute AR. In addition, they also show that the advertiser’s marketing objective to maximize actions is not aligned with the DSP’s interest.
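The arithmetic of Examples 1 and 2 can be replayed with a short script. This is only a sketch of the toy setting: the `User` class and `run_auction` helper are illustrative names that do not come from the paper, and the competitor is modeled as a single fixed bid of $3.5 per request, as in the examples.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    ar_shown: float       # AR if the advertiser's ad is shown
    ar_background: float  # AR if the ad is not shown


def run_auction(users, bids, competitor_bid, cpa):
    """Second-price auction against one fixed competitor bid per request."""
    actions = cost = revenue = 0.0
    for user in users:
        if bids[user.name] > competitor_bid:
            actions += user.ar_shown
            cost += competitor_bid          # winner pays the second-highest bid
            revenue += user.ar_shown * cpa  # advertiser pays CPA per attributed action
        else:
            actions += user.ar_background   # user may still act without the ad
    return actions, cost, revenue


CPA = 100.0
users = [User("a", 0.04, 0.03), User("b", 0.02, 0.001)]

value_bids = {u.name: u.ar_shown * CPA for u in users}  # $4 and $2
lift_bids = {"a": 2.0, "b": 3.8}                        # proportional to lifts 0.01 and 0.019

value_result = run_auction(users, value_bids, 3.5, CPA)
lift_result = run_auction(users, lift_bids, 3.5, CPA)
print("value-based (actions, cost, revenue):", value_result)
print("lift-based  (actions, cost, revenue):", lift_result)
```

Under these assumptions the lift-based run yields more expected actions for the same inventory cost, while the DSP's expected revenue drops below that cost.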

Our Contribution 

In this paper, we advocate for an industry-wide transition from value-based bidding to lift-based bidding. That is, the bid price should be based on the AR lift instead of the absolute AR. Our contributions can be summarized as follows:

• We propose the concept of lift-based bidding, which we prove both mathematically and empirically to be a better bidding strategy than value-based bidding in terms of maximizing advertiser benefits.

• We describe a simple yet effective modeling methodology to predict AR lift. An online A/B test with real ad campaigns backed up our concepts and techniques.

• We point out that the advertiser’s marketing objective is not aligned with the DSP’s interest. To move the DSPs to lift-based bidding, they should be rewarded based on the relative performance lift they contribute.

Value-Based Bidding vs. Lift-Based Bidding 

In this section, we prove that lift-based bidding is a better 
strategy than value-based bidding in terms of maximizing 
advertiser benefits. However, it would be opposed by DSPs 
under the industry-standard last-touch attribution model. 

Definition 1 (AR, background AR, and AR Lift). Given an ad request $q$ from a user $u$ and an advertiser $A$, we define AR w.r.t. $(q, u, A)$ as the probability that $u$ will take the desired action defined by $A$ after the ad of $A$ is served to $q$, background AR w.r.t. $(q, u, A)$ as the probability that $u$ will take the desired action if the ad of $A$ is not served to $q$, and AR lift as the difference between AR and background AR. We denote by $p$ the AR, $\Delta p$ the AR lift, and $p - \Delta p$ the background AR if no further specification is made. ■

The common practice in the industry is to bid $\text{CPA} \times p$. We generalize this practice and define value-based bidding as follows:

Definition 2 (Value-Based Bidding). Let $p$ be the AR of a user if the advertiser’s ad is shown. Value-based bidding places a bid price of $\alpha \times p$ to acquire an impression from this user for the advertiser, where $\alpha > 0$. ■

However, the examples in the Introduction section show that such a bidding strategy does not necessarily optimize the overall campaign performance for the advertiser. As we advocate for focusing on the AR lift instead of absolute AR, we propose the concept of lift-based bidding in which the bid price is proportional to the AR lift.

Definition 3 (Lift-Based Bidding). Let $\Delta p$ be the AR lift of a user if the advertiser’s ad is shown. Lift-based bidding places a bid price of $\beta \times \Delta p$ to acquire an impression from this user for the advertiser, where $\beta > 0$. ■

In the CPA pricing model, advertisers pay a DSP based on the number of actions attributed to it. The industry-standard attribution model is last-touch attribution.

Definition 4 (Last-Touch Attribution). An advertiser attributes the full credit of an observed user action to the DSP which delivered the last relevant ad impression to the user. ■

Suppose there are two DSPs, $DSP_1$ and $DSP_2$, bidding on behalf of the same advertiser at the same time. $DSP_1$ practices value-based bidding while $DSP_2$ executes lift-based bidding. Let $u_i$ be the user of the $i$-th ad request, $p_i$ be the AR if the advertiser’s ad is shown to $u_i$, and $\Delta p_i$ be the AR lift because of the ad impression. To simplify the discussion, let us assume every ad request is from a different user and there are no additional candidates in the auctions. In an ad exchange marketplace with pure second-price auction, we have

Lemma 1. $DSP_1$ wins the auction for $u_i$ at the cost of $\beta \times \Delta p_i$ if $\alpha \times p_i > \beta \times \Delta p_i$; $DSP_2$ wins the auction for $u_i$ at the cost of $\alpha \times p_i$ if $\alpha \times p_i < \beta \times \Delta p_i$. ■
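Lemma 1 fully determines who wins each request and at what price, so the two-DSP market can be simulated directly. The sketch below is illustrative only: the function name, the three users and the values of `alpha` and `beta` are assumptions chosen so that both DSPs receive the same attributed actions (users a and b reuse the ARs from the Introduction, user c is hypothetical). It also tallies the per-attributed-action yield and cost that Theorems 1 and 2 compare.

```python
def simulate(users, alpha, beta):
    """users: list of (p, delta_p) pairs, one ad request per user.

    DSP 1 bids alpha * p (value-based); DSP 2 bids beta * delta_p (lift-based).
    The higher bid wins and pays the other bid (pure second-price auction).
    Exact ties are not covered by Lemma 1; here they go to DSP 2.
    """
    stats = {dsp: {"attributed": 0.0, "cost": 0.0, "background": 0.0} for dsp in (1, 2)}
    for p, delta_p in users:
        value_bid, lift_bid = alpha * p, beta * delta_p
        if value_bid > lift_bid:
            winner, loser, price = 1, 2, lift_bid
        else:
            winner, loser, price = 2, 1, value_bid
        stats[winner]["attributed"] += p              # last-touch credit goes to the winner
        stats[winner]["cost"] += price
        stats[loser]["background"] += p - delta_p     # without the loser's ad, the user acts at background AR
    report = {}
    for dsp, s in stats.items():
        report[dsp] = {
            "attributed": s["attributed"],
            "actions_per_attributed": (s["attributed"] + s["background"]) / s["attributed"],
            "cost_per_attributed": s["cost"] / s["attributed"],
        }
    return report


# (AR if shown, AR lift) for users a, b and a hypothetical user c.
users = [(0.04, 0.01), (0.02, 0.019), (0.02, 0.015)]
report = simulate(users, alpha=100.0, beta=200.0)
for dsp, row in report.items():
    print(f"DSP {dsp}: {row}")
```

In this toy market the lift-based DSP shows both a higher number of actions per attributed action and a higher cost per attributed action, which is the tension the rest of the section formalizes.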

Theorem 1. With the last-touch attribution model, $DSP_2$ yields more actions than $DSP_1$ for the advertiser when the advertiser attributes the same amount of actions to them.²

Proof. Let $i$ be the index of all the users, $j$ be the index of those users that $DSP_1$ wins (i.e., $\alpha \times p_j > \beta \times \Delta p_j$), and $k$ be the index of those users that $DSP_2$ wins (i.e., $\alpha \times p_k < \beta \times \Delta p_k$). It is straightforward to see that the expected number of actions to be attributed to $DSP_1$ and $DSP_2$ are $\sum_j p_j$ and $\sum_k p_k$ respectively. The expected number of actions if only $DSP_1$ is considered can be decomposed into two parts: the sum of the ARs of users that $DSP_1$ wins, and the sum of the background ARs of users that $DSP_1$ loses. So it becomes $\sum_j p_j + \sum_k (p_k - \Delta p_k)$. Similarly, the expected number of actions if only $DSP_2$ is considered is $\sum_j (p_j - \Delta p_j) + \sum_k p_k$.

Therefore, let $\mathcal{A}_1$ ($\mathcal{A}_2$) be the expected number of actions per attributed action if only $DSP_1$ ($DSP_2$) is considered, we have

$$\mathcal{A}_1 = \frac{\sum_j p_j + \sum_k (p_k - \Delta p_k)}{\sum_j p_j} \tag{1}$$

$$\mathcal{A}_2 = \frac{\sum_j (p_j - \Delta p_j) + \sum_k p_k}{\sum_k p_k} \tag{2}$$

When the same amount of actions is attributed to $DSP_1$ and $DSP_2$ (i.e., $\sum_j p_j = \sum_k p_k$), we can swap the denominators and consolidate the numerators in Equations 1 and 2 using $\sum_i p_i = \sum_j p_j + \sum_k p_k$. Summing $\alpha \times p_k < \beta \times \Delta p_k$ over $k$ gives $\sum_k \Delta p_k > \frac{\alpha}{\beta} \sum_k p_k$, and summing $\alpha \times p_j > \beta \times \Delta p_j$ over $j$ gives $\sum_j \Delta p_j < \frac{\alpha}{\beta} \sum_j p_j$, so we have

$$\mathcal{A}_1 = \frac{\sum_i p_i - \sum_k \Delta p_k}{\sum_k p_k} < \frac{\sum_i p_i}{\sum_k p_k} - \frac{\alpha}{\beta} \tag{3}$$

$$\mathcal{A}_2 = \frac{\sum_i p_i - \sum_j \Delta p_j}{\sum_j p_j} > \frac{\sum_i p_i}{\sum_j p_j} - \frac{\alpha}{\beta} \tag{4}$$

Since $\sum_j p_j = \sum_k p_k$, it is obvious that $\mathcal{A}_1 < \mathcal{A}_2$. ■

2 The condition of $DSP_1$ and $DSP_2$ getting equal attributions from the advertiser is an important setup to illustrate that $DSP_2$ will in fact produce relatively more actions for the advertiser. This setup condition can be achieved by adjusting the winning landscape through the parameters $\alpha$ and $\beta$.

Theorem 2. With the last-touch attribution model, $DSP_2$ costs more than $DSP_1$ when the advertiser attributes the same amount of actions to them.

Proof. Again, let $i$ be the index of all the users, $j$ be the user index such that $\alpha \times p_j > \beta \times \Delta p_j$, and $k$ be the user index such that $\alpha \times p_k < \beta \times \Delta p_k$. The expected number of actions to be attributed to $DSP_1$ and $DSP_2$ are $\sum_j p_j$ and $\sum_k p_k$ respectively. By Lemma 1, the costs of $DSP_1$ and $DSP_2$ are $\sum_j \beta \times \Delta p_j$ and $\sum_k \alpha \times p_k$ respectively. Therefore, let $\mathcal{C}_1$ and $\mathcal{C}_2$ be the cost per attributed action of $DSP_1$ and $DSP_2$ respectively, we have

$$\mathcal{C}_1 = \frac{\sum_j \beta \times \Delta p_j}{\sum_j p_j} \tag{5}$$

$$\mathcal{C}_2 = \frac{\sum_k \alpha \times p_k}{\sum_k p_k} \tag{6}$$

Noticing $\alpha \times p_j > \beta \times \Delta p_j$, we have $\mathcal{C}_1 < \beta \times \frac{\alpha}{\beta} = \alpha$. It is apparent that $\mathcal{C}_2 = \alpha$. Therefore $\mathcal{C}_1 < \mathcal{C}_2$. ■

We have shown that lift-based bidding benefits the advertisers, but the higher cost per attributed action may undermine the interests of the DSPs. The root cause of this conflict is that the industry-standard attribution model does not attribute actions fairly to the DSPs. Researchers have pointed out that an action should be attributed to multiple touch points in a data-driven fashion (Shao and Li 2011) or based on causal lift (Dalessandro et al. 2012b). In the Attribution and Bidding section, we discuss the relationship between attribution models and bidding strategies. We show that to move the DSPs from value-based bidding to lift-based bidding, they should be rewarded based on the relative action lift they contributed to the final actions.

Lift-Based Bidding in Action

Predicting AR Lift

To implement lift-based bidding, it is important to estimate the AR lift. One may think of building a machine learning model to predict the lift directly. However, since the real ad serving logs contain only instances in which an ad is either shown or not shown, it is theoretically impossible to have the true AR lift data for modeling. Therefore, in order to predict the AR lift, we strive to estimate both ARs, assuming the ad is shown and not shown respectively.

Formally, let $a$ be an ad, $s$ be the state of a user at ad request time, and $s^+(a)$ be the state of the user if $a$ is shown. Conceptually, $s$ consists of the user’s demographic status, timestamped past events including page views, searches, ad views/clicks, and anything that describes the user state at ad request time. The only difference between $s$ and $s^+(a)$ is the ad impression of $a$. Let $p(\text{action} \mid s)$ be the AR of the user if $a$ is not shown and $p(\text{action} \mid s^+(a))$ be the AR if $a$ is shown, the AR lift is
$$\Delta p = p(\text{action} \mid s^+(a)) - p(\text{action} \mid s) \tag{7}$$

For any specific ad request instance in the ad serving log, the ad was either shown or not shown. Therefore, one of the two cases will always be absent in the modeling data. We address this challenge by establishing a model that has sufficient generalization capability. More specifically, we use a function $F$ to map a state to a set of features shared among different instances. Then a single and generic AR prediction model $P$ is built upon the derived feature set and the AR lift can be estimated as

$$\widehat{\Delta p} = P(\text{action} \mid F(s^+(a))) - P(\text{action} \mid F(s)) \tag{8}$$

The difference between $F(s^+(a))$ and $F(s)$ is reflected by different feature values induced from $a$. At ad serving time when $\Delta p$ is to be estimated, if for instance we consider the impression frequency of $a$ as a feature, the feature value in $F(s^+(a))$ should be greater than that in $F(s)$ by one.
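Equation 8 amounts to scoring the same model twice on two feature vectors that differ only in the features induced by the ad. The following is a minimal sketch of that idea: `toy_model` is a made-up logistic curve standing in for the trained model $P$, and the helper names and feature values are assumptions for illustration, with only the impression-frequency feature being adjusted.

```python
import math
from typing import Callable, Dict

Features = Dict[str, float]


def features_if_shown(features: Features, freq_key: str = "IMP_FREQ_ADV") -> Features:
    """Return F(s+(a)): a copy of F(s) with one more impression of the ad."""
    shown = dict(features)  # leave the original feature vector untouched
    shown[freq_key] = shown.get(freq_key, 0.0) + 1.0
    return shown


def predict_lift(model: Callable[[Features], float], features: Features) -> float:
    """Estimate delta_p = P(action | F(s+(a))) - P(action | F(s))."""
    return model(features_if_shown(features)) - model(features)


def toy_model(features: Features) -> float:
    """Hypothetical stand-in for the trained AR model P (not the paper's model)."""
    z = -4.0 + 0.5 * features.get("IMP_FREQ_ADV", 0.0) + 0.3 * features.get("PV_FREQ_TOPIC", 0.0)
    return 1.0 / (1.0 + math.exp(-z))


state = {"IMP_FREQ_ADV": 2.0, "PV_FREQ_TOPIC": 5.0}
lift = predict_lift(toy_model, state)
print(f"background AR = {toy_model(state):.4f}, estimated lift = {lift:.4f}")
```

A real feature function would likely update every feature the impression touches (for example impression recency), which this sketch leaves out.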


Model Training 

Our task is to train a generic AR prediction model $P$ to give AR estimations for both cases when an ad is shown or not shown. Existing AR prediction models in the literature are trained based on the post-view (Lee et al. 2012) or post-click (Rosales, Cheng, and Manavoglu 2012) actions. That is, the training samples are collected from only those impression events or click events. For example, in post-view action modeling, each impression event will trigger a training sample which is labeled as positive if an action follows. However, such a methodology is not preferred in our scenario for several reasons.

First, since the training samples from only impression or click events are not representative of those cases when the ad is not shown, models trained upon these samples are not generalized enough to predict $p(\text{action} \mid s)$. Second, even for predicting $p(\text{action} \mid s^+(a))$, leveraging training samples from only impression or click events still suffers from survival bias. In the RTB marketplace, impressions are purchased through public auctions. Therefore, impressions and clicks are available from only those winning auctions. Such survival bias is prevalent in click modeling. In action modeling, it can be avoided because actions can happen even if no impression was shown. Third, our observation from real ad campaigns shows that for the majority (usually >90%) of actions, we have not shown any ad of the advertiser before.³ In other words, if training samples are only generated from impression or click events, the majority of actions (positive samples) are not leveraged for modeling.

Figure 1: Less than 10% of the reported actions have preceding ad impressions from the same advertiser within the lookback window. Training samples generated from only impression and/or click events miss a large portion of the informative actions.

Therefore, we take a different approach: we train the AR prediction model upon the whole population. Training samples are generated from every user’s timeline instead of from merely impression/click events. To mimic the true action distribution, we first randomly select a user $u$ weighted by its ad request frequency. Then a random timestamp $ts$ is chosen on its timeline and a training sample is generated based on $u$ and $ts$. If $u$ has at least one action in the action window (denoted by $(ts, ts + aw]$), the sample is labeled as positive, and otherwise negative, where $aw$ is the action window size, ranging from several hours to several days based on business definitions. Then features are generated within the feature window (denoted by $(ts - fw, ts]$), where $fw$ is the feature window size.
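The sampling procedure above can be sketched as follows. The data layout (dictionaries of timestamps per user) and the function name are assumptions for illustration, and "a random timestamp on the timeline" is interpreted here as a uniform draw between the user's first and last ad request, which the text does not specify.

```python
import random
from typing import Dict, List, Tuple


def generate_sample(ad_requests: Dict[str, List[float]],
                    actions: Dict[str, List[float]],
                    aw: float, fw: float,
                    rng: random.Random) -> Tuple[str, float, Tuple[float, float], int]:
    """Draw one training sample: (user, ts, feature window, label)."""
    users = list(ad_requests)
    # Select a user with probability proportional to its ad request frequency.
    weights = [len(ad_requests[u]) for u in users]
    user = rng.choices(users, weights=weights, k=1)[0]
    # Choose a random timestamp on the user's timeline (assumed: first to last request).
    timeline = ad_requests[user]
    ts = rng.uniform(min(timeline), max(timeline))
    # Positive if at least one action falls in the action window (ts, ts + aw].
    label = int(any(ts < t <= ts + aw for t in actions.get(user, [])))
    # Features would then be computed from events in the feature window (ts - fw, ts].
    return user, ts, (ts - fw, ts), label


rng = random.Random(0)
ad_requests = {"u1": [0.0, 10.0, 20.0], "u2": [5.0]}  # timestamps in hours
actions = {"u1": [22.0]}                               # u2 never converts
samples = [generate_sample(ad_requests, actions, aw=24.0, fw=48.0, rng=rng)
           for _ in range(100)]
positives = sum(label for _, _, _, label in samples)
print(f"{positives} positive samples out of {len(samples)}")
```

Note that the label depends only on whether an action follows the sampled timestamp, not on whether any ad was shown, which is what lets the same samples serve both the shown and not-shown cases.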

Raw input variables for feature generation include the user’s historical profile within the feature window, such as page views, ad impressions, clicks, searches and mobile app based events. Each user event is used as a point-in-time to generate features. Table 1 is a list of different types of features we generated for AR prediction. At serving time, ad request details such as geo-location and the web page or mobile app being visited are folded into these features so that the run-time context can also be leveraged for prediction. For example, if the recency of visiting the Yahoo! homepage is a feature, an ad request from the Yahoo! homepage will set this feature value as “most recent”.

Source         Feature name       Note
Behaviors      IMP_FREQ_ADV       Impression and click frequency/recency from each
               IMP_RNCY_ADV       advertiser. Page-view and search frequency/recency
               CLK_FREQ_ADV       of each topic (pages and search queries are
               CLK_RNCY_ADV       segmented into several semantic topics).
               PV_FREQ_TOPIC
               PV_RNCY_TOPIC
               SRCH_FREQ_TOPIC
               SRCH_RNCY_TOPIC
Demographics   AGE_GROUP          Ages are mapped into several age groups. User’s
               GENDER             geographic location is at some moderate
               GEO_AREA           resolution level.
Mobile         INST_FREQ_APP      Installation and usage frequency/recency of each
               INST_RNCY_APP      mobile app.
               USE_FREQ_APP
               USE_RNCY_APP

Table 1: Features generated for AR modeling.

3 Advertisers usually report actions via action pixels placed on their websites or apps. Therefore DSPs can have the full action set.

The sample generation terminates when all the action events have been involved or the positive samples are sufficient. Once the training samples are gathered, we train a Gradient-Boosting-Decision-Tree (GBDT) model to predict the rank order and then calibrate using isotonic regression to translate a GBDT score to an AR. Please note that we utilize our in-house GBDT tool with distributed training capability for modeling; however, other suitable machine learning models can also be applied.
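The calibration step can be illustrated independently of the ranking model. The sketch below fits an isotonic (non-decreasing) step function with the pool-adjacent-violators algorithm in plain Python; it is not the in-house tool mentioned above, the function names are illustrative, and the scores are made-up stand-ins for GBDT outputs. For simplicity it assumes distinct scores and does not pool tied scores.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def fit_isotonic(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Pool-adjacent-violators fit of a non-decreasing map from score to AR.

    Returns (thresholds, values): block i starts at score thresholds[i]
    and predicts the observed action rate values[i].
    """
    blocks: List[List[float]] = []  # each block is [min_score, label_sum, count]
    for score, label in sorted(zip(scores, labels)):
        blocks.append([score, float(label), 1.0])
        # Merge backwards while the previous block's mean exceeds the last block's mean.
        while len(blocks) > 1 and blocks[-2][1] / blocks[-2][2] > blocks[-1][1] / blocks[-1][2]:
            _, label_sum, count = blocks.pop()
            blocks[-1][1] += label_sum
            blocks[-1][2] += count
    thresholds = [b[0] for b in blocks]
    values = [b[1] / b[2] for b in blocks]
    return thresholds, values


def calibrate(score: float, thresholds: List[float], values: List[float]) -> float:
    """Translate a raw model score to a calibrated AR using the fitted step function."""
    index = max(bisect_right(thresholds, score) - 1, 0)
    return values[index]


raw_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # hypothetical ranking scores
observed = [0, 1, 0, 0, 1, 1]                 # 1 = action in the action window
thresholds, values = fit_isotonic(raw_scores, observed)
print(list(zip(thresholds, values)))
print(calibrate(0.35, thresholds, values))
```

Because the fitted map is monotone, calibration preserves the rank order produced by the ranking model while turning scores into rates that can be multiplied by a CPA.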

Fitting Lift-Based Bidding in the Market 

Conventional value-based bidding calculates the bid price by multiplying the predicted absolute AR by the advertiser CPA. In lift-based bidding, it is not proper to simply multiply the AR lift by the same advertiser CPA. Otherwise one can seldom win any auction if the majority of the other competitor DSPs are still practicing value-based bidding. Recall that in lift-based bidding the bid price is proportional to the AR lift, i.e., $\beta \times \Delta p$. Our selection of $\beta$ is

$$\beta = \frac{\bar{p}}{\overline{\Delta p}} \times \text{CPA} \tag{9}$$

where $\bar{p}$ is the population mean of AR and $\overline{\Delta p}$ is the population mean of AR lift. The idea is straightforward: if the advertiser is willing to pay CPA for each action in the conventional way, then each incremental action should be paid at the price of $\frac{\bar{p}}{\overline{\Delta p}} \times \text{CPA}$ if only incremental actions need to get paid.
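A small sketch of Equation 9 follows. The function names are illustrative, the "population" is just the two users from the Introduction, and bidding zero on a non-positive predicted lift is an assumption of this sketch rather than something the text specifies.

```python
def lift_bid_scale(mean_ar: float, mean_ar_lift: float, cpa: float) -> float:
    """beta = (population mean AR / population mean AR lift) * CPA (Equation 9)."""
    if mean_ar_lift <= 0:
        raise ValueError("mean AR lift must be positive")
    return mean_ar / mean_ar_lift * cpa


def lift_based_bid(ar_lift: float, beta: float) -> float:
    """Bid beta * delta_p; a non-positive predicted lift yields a zero bid (assumption)."""
    return beta * max(ar_lift, 0.0)


CPA = 100.0
population = [(0.04, 0.01), (0.02, 0.019)]  # (AR, AR lift) per user
mean_ar = sum(p for p, _ in population) / len(population)
mean_lift = sum(d for _, d in population) / len(population)

beta = lift_bid_scale(mean_ar, mean_lift, CPA)
lift_bids = [lift_based_bid(d, beta) for _, d in population]
value_bids = [p * CPA for p, _ in population]
print(f"beta = {beta:.2f}")
print("lift-based bids:", [round(b, 3) for b in lift_bids])
print("value-based bids:", value_bids)
```

With this choice of $\beta$ the average lift-based bid equals the average value-based bid, so the bidder stays competitive overall while shifting spend toward users with high lift.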

Blind A/B Test with a Real DSP 

To empirically validate our proposed concepts, we set up A/B test experiments on Yahoo’s Demand-Side Platform. We first randomly split users into three equal-sized groups. Then we created three bidders: a passive bidder which always places a zero bid, a value-based bidder, and a lift-based bidder. We selected five advertisers to participate in the test. To be fair, each advertiser’s budget was evenly split and assigned to the value-based bidder and lift-based bidder respectively (the passive bidder would not spend any budget). Each advertiser’s campaign ran for one week and their budgets were all spent out, which means the value-based bidder and lift-based bidder spent the same amount of budget. We counted the number of actions⁴ observed in each group in a three-week window from the campaign start date. The results shown in Tables 2, 3, 4, and 5 backed up our claims and methods.

First, by comparing the passive bidder and the value-based bidder (Table 2), it is easy to demonstrate that a user may take actions even if no impression is shown. Although showing ads to users did help lift the action yield, the number of actions when no ad is shown is already significant. This is not surprising because advertisers typically run campaigns through multiple channels such as TV, magazines, and the internet simultaneously. Even if a user is not shown any display ad by our DSP, she can still be influenced by other touch points. This is exactly why lift-based bidding is preferable, since it tries to maximize the effectiveness of display ads by taking background AR into account. Table 3 shows the comparison between the passive bidder and the lift-based bidder.

Second, from the advertiser’s perspective, the lift-based bidder generated more actions than the value-based bidder with the same amount of budget. This result is observed consistently among all five advertisers (Table 4). Since background actions are prevalent, a fairer comparison is between the two bidders’ action lifts over background actions, which we call lift-over-lift. Take Advertiser 1 for example: the value-based bidder generated 11.2% more actions than the background actions while the lift-based bidder yielded 28.7% more. In this case the lift-over-lift measure is (28.7 - 11.2)/11.2 = 156%. Lift-based bidding dramatically increases the incremental actions compared to value-based bidding, so the lift-over-lift measure is very significant.
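The action lift and lift-over-lift figures can be recomputed from the raw action counts reported in Tables 2 and 3. The short script below is only a convenience check; the helper names are illustrative. It works from unrounded lifts, so Advertiser 1 comes out at 155.6% rather than the 156.25% obtained from the rounded percentages; both round to 156%.

```python
# Action counts copied from Tables 2 and 3: (passive, value-based, lift-based).
ACTIONS = {
    1: (642, 714, 826),
    2: (823, 896, 980),
    3: (1438, 1477, 1509),
    4: (1892, 2016, 2471),
    5: (5610, 6708, 8291),
}


def action_lift(actions: int, background: int) -> float:
    """Relative increase in actions over the passive (background) group."""
    return (actions - background) / background


def lift_over_lift(passive: int, value: int, lift: int) -> float:
    """Relative gain of the lift-based action lift over the value-based action lift."""
    value_lift = action_lift(value, passive)
    lift_lift = action_lift(lift, passive)
    return (lift_lift - value_lift) / value_lift


for adv, (passive, value, lift) in ACTIONS.items():
    print(f"Adv {adv}: value-based {action_lift(value, passive):.1%}, "
          f"lift-based {action_lift(lift, passive):.1%}, "
          f"lift-over-lift {lift_over_lift(passive, value, lift):.0%}")
```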

Third, from the DSP’s perspective, the lift-based bidder resulted in more inventory cost than the value-based bidder when the same number of actions were attributed to them. Recall that the advertiser only pays for each attributed action at the price of CPA and the two bidders spent the same amount of budget, so the numbers of attributions are the same. From Table 5 we observe that the inventory cost of the lift-based bidder is consistently higher than that of the value-based bidder.

Lastly and interestingly, we have observed an increased number of impressions when comparing lift-based bidding to value-based bidding. Even though the overall inventory cost is higher, the effective cost per impression is lower for the lift-based bidder than for the value-based bidder. The lift-based bidder does not always compete with other bidders for those high-AR users. Instead, it tries to acquire users who are more likely to be influenced. Therefore it has the advantage of avoiding competition and acquiring more impressions at a lower cost per impression.

4 Action pixel fires, to be more accurate. Even in the passive group, there can still be actions reported via action pixel fires on the advertiser’s website or app.

The above results backed up our concepts and techniques. Since the lift-based bidder took on the risk of higher cost while, as we have seen, it actually benefited the advertisers, advertisers should consider a more reasonable attribution model to align the DSPs’ benefits with their marketing objectives.
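The two derived columns of Table 5 (inventory-cost difference and cost-per-impression difference) follow directly from the reported impressions and inventory costs. The script below recomputes them as a check; the helper names are illustrative and the figures are copied from Table 5.

```python
# (impressions, inventory cost in USD) for the value-based and lift-based bidder, from Table 5.
TABLE_5 = {
    1: ((53_396, 278.73), (59_703, 300.31)),
    2: ((298_333, 1_065.05), (431_637, 1_467.57)),
    3: ((11_048_583, 25_522.22), (11_483_360, 25_837.56)),
    4: ((3_915_792, 10_846.74), (4_368_441, 11_183.21)),
    5: ((6_015_322, 19_296.51), (8_770_935, 23_501.90)),
}


def relative_diff(new: float, old: float) -> float:
    return (new - old) / old


def cost_diffs(value_bidder, lift_bidder):
    """Return (inventory-cost diff, cost-per-impression diff) of lift-based vs. value-based."""
    (v_imps, v_cost), (l_imps, l_cost) = value_bidder, lift_bidder
    inventory = relative_diff(l_cost, v_cost)
    per_imp = relative_diff(l_cost / l_imps, v_cost / v_imps)
    return inventory, per_imp


diffs = {adv: cost_diffs(*row) for adv, row in TABLE_5.items()}
for adv, (inventory, per_imp) in diffs.items():
    print(f"Adv {adv}: inventory cost {inventory:+.1%}, cost per impression {per_imp:+.1%}")
```

For every advertiser the total inventory cost rises while the cost per impression falls, matching the pattern described above.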

Attribution and Bidding 

We have proved that DSPs may be opposed to lift-based bidding because of the higher cost per attributed action. The root cause is that they are not rewarded based on the action lift they contribute. Therefore they do not have the incentive to bid to maximize total actions. Actually, a rational DSP will always bid at the price

$$\text{eCPM} = \text{AR} \times \text{CPA} \times p(\text{attribution} \mid \text{action}) \tag{10}$$

where $p(\text{attribution} \mid \text{action})$ is the probability that the DSP gets attributed if an action happens. The common industry practice of bidding AR × CPA simplifies this by assuming that the full credit of an action will eventually be attributed to the DSP. In many scenarios, such an assumption is true. However, we must point out that it is not always valid, especially when multiple DSPs are running campaigns for the same advertiser simultaneously.
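Equation 10 is a one-line computation, sketched here with an illustrative function name and made-up attribution probabilities to show how the rational bid shrinks as the chance of being credited drops.

```python
def rational_ecpm_bid(ar: float, cpa: float, p_attribution: float = 1.0) -> float:
    """eCPM = AR * CPA * p(attribution | action), following Equation 10."""
    if not 0.0 <= p_attribution <= 1.0:
        raise ValueError("p_attribution must be a probability")
    return ar * cpa * p_attribution


# A sole DSP that always receives the credit bids the familiar AR * CPA.
sole_dsp_bid = rational_ecpm_bid(ar=0.04, cpa=100.0)
# If another DSP is expected to take the credit half of the time, the rational bid halves.
shared_credit_bid = rational_ecpm_bid(ar=0.04, cpa=100.0, p_attribution=0.5)
print(sole_dsp_bid, shared_credit_bid)
```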

Given that the DSPs will always bid a rational eCPM price, we are more interested in how advertisers can move them to lift-based bidding, which we have shown can bring more actions to the advertisers. Intuitively, if the DSPs are attributed based on the relative AR lift they contribute to the final AR, they have more incentive to practice lift-based bidding. Therefore the key is the attribution model.

Again, let $u_i$ be the user of the $i$-th ad request, $p_i$ be the AR if the advertiser’s ad is shown to $u_i$, and $\Delta p_i$ be the AR lift. Let $a_i = p(\text{attribution} \mid \text{action}, u_i)$ be the probability that the action from $u_i$ is attributed to the DSP that wins $u_i$. Suppose there are two DSPs, $DSP_1$ and $DSP_2$, bidding on behalf of the same advertiser at the same time. $DSP_1$ always bids the rational price (i.e., $\text{CPA} \times p_i \times a_i$) while $DSP_2$ practices lift-based bidding (i.e., bids $\beta \times \Delta p_i$).

Theorem 3. Unless $a_i = \frac{\beta}{\text{CPA}} \times \frac{\Delta p_i}{p_i}$ (i.e., $DSP_1$ always bids the same price as $DSP_2$), $DSP_2$ yields more actions than $DSP_1$ for the advertiser when the advertiser attributes the same amount of actions to them.

Proof. Let $i$ be the index of all the users, $j$ be the index of those users that $DSP_1$ wins (i.e., $\text{CPA} \times p_j \times a_j > \beta \times \Delta p_j$), and $k$ be the index of those users that $DSP_2$ wins (i.e., $\text{CPA} \times p_k \times a_k < \beta \times \Delta p_k$). It is straightforward to see that the expected number of actions to be attributed to $DSP_1$ and $DSP_2$ are $\sum_j p_j \times a_j$ and $\sum_k p_k \times a_k$ respectively. The expected number of actions if only $DSP_1$ is considered is $\sum_j p_j + \sum_k (p_k - \Delta p_k)$, and the expected number of actions if only $DSP_2$ is considered is $\sum_j (p_j - \Delta p_j) + \sum_k p_k$.

Therefore, let $\mathcal{A}_1$ ($\mathcal{A}_2$) be the expected number of actions per attributed action if only $DSP_1$ ($DSP_2$) is considered,




Adv   Passive bidder           Value-based bidder         Incremental   Action
      # imps    # actions      # imps        # actions    actions       lift
1     0         642            53,396        714          72            11.2%
2     0         823            298,333       896          73            8.9%
3     0         1,438          11,048,583    1,477        39            2.7%
4     0         1,892          3,915,792     2,016        124           6.6%
5     0         5,610          6,015,322     6,708        1,098         19.6%

Table 2: Blind A/B test on five pilot advertisers - Value-based bidder vs. Passive bidder.


Adv   Passive bidder           Lift-based bidder          Incremental   Action
      # imps    # actions      # imps        # actions    actions       lift
1     0         642            59,703        826          184           28.7%
2     0         823            431,637       980          157           19.1%
3     0         1,438          11,483,360    1,509        71            4.9%
4     0         1,892          4,368,441     2,471        579           30.6%
5     0         5,610          8,770,935     8,291        2,681         47.8%

Table 3: Blind A/B test on five pilot advertisers - Lift-based bidder vs. Passive bidder.


Adv   Value-based bidder                          Lift-based bidder                           Action   Lift-over-
      # imps       # actions   Action lift        # imps       # actions   Action lift        lift     lift
                               (vs. passive)                               (vs. passive)
1     53,396       714         11.2%              59,703       826         28.7%              13.6%    156%
2     298,333      896         8.9%               431,637      980         19.1%              9.4%     115%
3     11,048,583   1,477       2.7%               11,483,360   1,509       4.9%               2.2%     82%
4     3,915,792    2,016       6.6%               4,368,441    2,471       30.6%              22.6%    367%
5     6,015,322    6,708       19.6%              8,770,935    8,291       47.8%              23.6%    144%

Table 4: Lift-based bidding vs. Value-based bidding - Advertiser’s perspective. “Action lift” is the absolute # actions difference between lift-based bidder and value-based bidder. “Lift-over-lift” compares their action lifts over the passive bidder.


Adv   Value-based bidder                       Lift-based bidder                        Inventory-   Cost-per-
      # imps       # attrs   Inventory cost    # imps       # attrs   Inventory cost    cost diff    imp diff
1     53,396       50        $278.73           59,703       50        $300.31           7.7%         -3.6%
2     298,333      80        $1,065.05         431,637      80        $1,467.57         37.8%        -4.8%
3     11,048,583   240       $25,522.22        11,483,360   240       $25,837.56        1.2%         -2.6%
4     3,915,792    200       $10,846.74        4,368,441    200       $11,183.21        3.1%         -7.6%
5     6,015,322    500       $19,296.51        8,770,935    500       $23,501.90        21.8%        -16.5%

Table 5: Lift-based bidding vs. Value-based bidding - DSP’s perspective. Both bidders spent out an equal amount of assigned budget, so the # attributions are always the same. Cost-per-impression is the inventory cost averaged over # impressions.


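we have

$$\mathcal{A}_1 = \frac{\sum_j p_j + \sum_k (p_k - \Delta p_k)}{\sum_j p_j \times a_j} \tag{11}$$

$$\mathcal{A}_2 = \frac{\sum_j (p_j - \Delta p_j) + \sum_k p_k}{\sum_k p_k \times a_k} \tag{12}$$

If $DSP_1$ and $DSP_2$ are not always bidding the same price, we can always adjust $\beta$ to control the winning landscape so that $DSP_1$ and $DSP_2$ get the same amount of attribution. When the same amount of actions is attributed to $DSP_1$ and $DSP_2$ (i.e., $\sum_j p_j \times a_j = \sum_k p_k \times a_k$), by swapping the denominators and consolidating the numerators in Equations 11 and 12, noticing $\text{CPA} \times p_j \times a_j > \beta \times \Delta p_j$ and $\text{CPA} \times p_k \times a_k < \beta \times \Delta p_k$, we have

$$\mathcal{A}_1 = \frac{\sum_i p_i - \sum_k \Delta p_k}{\sum_k p_k \times a_k} < \frac{\sum_i p_i}{\sum_k p_k \times a_k} - \frac{\text{CPA}}{\beta} \tag{13}$$

$$\mathcal{A}_2 = \frac{\sum_i p_i - \sum_j \Delta p_j}{\sum_j p_j \times a_j} > \frac{\sum_i p_i}{\sum_j p_j \times a_j} - \frac{\text{CPA}}{\beta} \tag{14}$$

Therefore $\mathcal{A}_1 < \mathcal{A}_2$ unless $DSP_1$ always bids the same price as $DSP_2$, i.e., $a_i = \frac{\beta}{\text{CPA}} \times \frac{\Delta p_i}{p_i}$. ■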

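Theorem 4. Unless $a_i = \frac{\beta}{\text{CPA}} \times \frac{\Delta p_i}{p_i}$ (i.e., $DSP_1$ always bids the same price as $DSP_2$), $DSP_2$ costs more than $DSP_1$ when the advertiser attributes the same amount of actions to them.

Proof. The expected number of actions to be attributed to $DSP_1$ and $DSP_2$ are $\sum_j p_j \times a_j$ and $\sum_k p_k \times a_k$ respectively. The costs of $DSP_1$ and $DSP_2$ are $\sum_j \beta \times \Delta p_j$ and $\sum_k \text{CPA} \times p_k \times a_k$ respectively. Therefore, let $\mathcal{C}_1$ and $\mathcal{C}_2$ be the cost per attributed action of $DSP_1$ and $DSP_2$ respectively, we have

$$\mathcal{C}_1 = \frac{\sum_j \beta \times \Delta p_j}{\sum_j p_j \times a_j} \tag{15}$$

$$\mathcal{C}_2 = \frac{\sum_k \text{CPA} \times p_k \times a_k}{\sum_k p_k \times a_k} \tag{16}$$

Noticing $\text{CPA} \times p_j \times a_j > \beta \times \Delta p_j$, we have $\mathcal{C}_1 < \text{CPA}$. Since $\mathcal{C}_2 = \text{CPA}$, $\mathcal{C}_1 < \mathcal{C}_2$ unless $DSP_1$ always places the same bid as $DSP_2$, i.e., $a_i = \frac{\beta}{\text{CPA}} \times \frac{\Delta p_i}{p_i}$. ■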
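The equality case in Theorems 3 and 4 can be checked numerically: when the attribution probability is proportional to the relative lift, a rational bidder ends up placing exactly the lift-based bid. The sketch below uses illustrative function names, the two users from the Introduction, and $\beta = \text{CPA}$ (an assumption made here so that $a_i = \Delta p_i / p_i$ stays a valid probability).

```python
import math


def lift_proportional_attribution(p: float, delta_p: float, beta: float, cpa: float) -> float:
    """a_i = (beta / CPA) * (delta_p_i / p_i), the equality case of Theorems 3 and 4."""
    return beta / cpa * delta_p / p


def rational_bid(p: float, cpa: float, a: float) -> float:
    """Rational bid CPA * p_i * a_i."""
    return cpa * p * a


CPA = 100.0
BETA = 100.0  # assumption: beta = CPA keeps a_i = delta_p_i / p_i within [0, 1]
users = {"a": (0.04, 0.01), "b": (0.02, 0.019)}  # (AR, AR lift)

for name, (p, delta_p) in users.items():
    a = lift_proportional_attribution(p, delta_p, BETA, CPA)
    bid = rational_bid(p, CPA, a)
    assert math.isclose(bid, BETA * delta_p)  # rational bid coincides with the lift-based bid
    print(f"user {name}: attribution probability {a:.2f}, rational bid ${bid:.2f}")
```

Under such an attribution rule, the loyal user a earns the DSP only a quarter of the credit, so a rational bidder naturally bids less for her than for the more persuadable user b.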


Theorems 3 and 4 suggest that for a rational DSP, the only way to move it to lift-based bidding is to attribute based on the relative lift it contributes to the final action (i.e., $a_i \propto \frac{\Delta p_i}{p_i}$). We believe that by directly paying for the performance lift, the advertisers can move the DSPs to the more optimized bidding strategy.

Related work

Predicting AR for display ads has attracted much research interest. Most existing works towards this goal model actions and clicks within the same framework (Lee et al. 2012; Rosales, Cheng, and Manavoglu 2012; Chapelle 2014). We notice that none of these existing works explicitly models AR lift. Although researchers are exploring alternative bidding strategies in order to optimize some Key Performance Indicator (KPI) under budget constraint throughout the campaign’s lifetime (Zhang, Yuan, and Wang 2014; Perlich et al. 2012), our work is fundamentally different from these bid optimization works. First, they assume a CPM pricing model and therefore bid optimization is under budget constraint, which is not a serious concern in the CPA pricing model. Second, in their approaches, the modified bids are still derived from functions that map absolute AR to the final bid. So they still bid based on the absolute AR. Several data-driven multi-touch attribution methods have been proposed in recent years (Shao and Li 2011; Dalessandro et al. 2012b; Wooff and Anderson 2014). Our focus in this paper is more on illustrating the relationship between bidding strategies and attribution models than on a specific attribution method. Budget allocation based on multi-touch attribution is studied in (Geyik, Saxena, and Dasdan 2014), which we claim can be complementary to our work. After budget allocation is done, our approach can further optimize at the individual impression level. In (Dalessandro et al. 2012a), the authors also pointed out that some marketers may prefer to evaluate and optimize campaigns for incremental purchases. However, their focus was identifying better proxies than clicks to evaluate online advertising effectiveness instead of innovating bidding strategies.

Conclusion and Future Work

We presented a general modeling framework to predict the AR lift, and applied it to drive a novel bidding strategy. We have proven both mathematically and empirically that the new bidding strategy indeed helps advertisers to improve campaign performance. We believe it will be an industry trend that advertisers will pay only for those impressions that drive incremental market value. This is an exciting new research area with many potential directions such as building accurate AR lift prediction models, innovating alternative CPA pricing models, and developing fairer attribution models. Another particularly interesting topic is how these models and technologies can be implemented in the industry where multiple parties such as advertisers, DSPs, ad exchanges, and third-party verification companies are all involved.